All real networks are different, but many have some structural properties in common. There seems to be no 
consensus on what the most common properties are, but scale-free degree distributions, strong clustering, and 
community structure are frequently mentioned without question. Surprisingly, there exists no simple generative 
mechanism explaining all three properties at once in growing networks. Here we show how latent network 
geometry coupled with preferential attachment of nodes to this geometry fills this gap. We call this mechanism 
geometric preferential attachment (GPA), and validate it against the Internet. GPA gives rise to soft communities 
that provide a different perspective on the community structure in networks. The connections between GPA and 
cosmological models, including inflation, are also discussed. 


I. INTRODUCTION 

One of the fundamental problems in the study of complex networks GH3 is to identify evolution
mechanisms that shape the structure and dynamics of large real networks such as the Internet, the
world wide web, and various biological and social networks. In particular, how do complex networks
grow so that many of them are scale-free and have strong clustering and non-trivial community
structure? The preferential attachment (PA) mechanism E®, where new connections are made
preferentially to more popular nodes, is widely accepted as a plausible explanation for the
emergence of scale-free structures (i.e. power-law degree distributions) in large networks. PA has
been empirically validated for many real growing networks mm using statistical analysis of a
sequence of network snapshots, demonstrating that it is indeed a key element of network evolution.
Moreover, there is some evidence that the evolution of the community graph, a graph where nodes
represent communities and links refer to members shared by two communities, is also driven by
PA [13].

Nevertheless, PA alone cannot explain two other empirically observed universal properties of
complex networks: strong clustering Dl and significant community structure m. Namely, in synthetic
networks generated by standard PA, clustering is asymptotically zero m and there are no
communities B3. To resolve the zero-clustering problem, several modifications of the original PA
mechanism have been proposed (TIETI . To the best of our knowledge, however, none of these models
captures all three fundamental properties of complex networks: heavy-tailed degree distribution,
high clustering, and community structure.

In social networks, the presence of communities, which might represent node clusters based on
certain social factors such as economic status or political beliefs, is intuitively expected. A
remarkable observation [15, 22-26] is that many other networks, including food webs, the world wide
web, metabolic, biochemical, and financial networks, also admit a reasonable division into
informative communities. Since that discovery, community detection has become one of the main
tools for the analysis and understanding of network data [17, 27].

Despite an enormous amount of attention to community detection algorithms and their efficiency,
there have been very few attempts to answer a more fundamental question: what is the actual
mechanism that induces community structure in real networks? For social networks, where there is a
strong relationship between a high concentration of triangles and the existence of community
structure [28], triadic closure [29] has been proposed as a plausible mechanism for generating
communities [30]. It was also shown by means of a simple agent-based acquaintance model that a
large-scale community structure can emerge from the underlying social dynamics ED-
There also exist other contributions in this direction, where proposed mechanisms and generative
models are specifically tailored for social networks [32-35].

Here we show how latent network geometry coupled with preferential attachment of nodes to this
geometry induces community structure as well as power-law degree distributions and strong
clustering. We prove that these universal properties of complex networks naturally emerge from the
new mechanism that we call geometric preferential attachment (GPA), without appealing to the
specific nature (e.g. social) of networks. Using the Internet as an example, we demonstrate that
GPA generates networks that are in many ways similar to real networks.


II. RESULTS 

A. Geometric Preferential Attachment 

In growing networks, the concept of popularity that PA exploits is just one aspect of node
attractiveness; another important aspect is similarity [36]. Namely, if nodes are similar (“birds
of a feather”), then they have a higher chance of being connected (“flock together”), even if they
are not popular. This effect, known as homophily in social sciences ED, has been observed in many
real networks of various nature (31(39].

The GPA mechanism utilizes the idea that both popularity and similarity are important. We take the
node birth time $t = 1, 2, \ldots$ as a proxy for the node’s popularity: all other things being
equal, the older the node (i.e. the smaller $t$), the more popular it is. The similarity attribute
of node $t$ is modeled by a random variable $\theta_t$ distributed over a circle $S^1$ that
abstracts the “similarity” space. One can think of the similarity space as an image of a certain
projection $p : \mathcal{A} \to S^1$ from a space of unknown or not easily measurable attributes
$(a^1, \ldots, a^k) \in \mathcal{A}$ of nodes. For social networks, these attributes could be
political beliefs, education, and social status, whereas for biological networks, $\{a^i\}$ may
represent chemical properties of metabolites or geometric properties of protein shapes. While the
absolute value of the similarity coordinate $\theta_t = p(a^1, \ldots, a^k)$ does not have any
specific meaning, the angular distance $\theta_{st} = \pi - |\pi - |\theta_s - \theta_t||$
quantifies the similarity between two nodes. Upon its birth, a new node $t$ connects to an existing
node $s < t$ if $s$ is both popular enough and similar to $t$, that is, if $s^{\beta}\theta_{st}$
is small, where $\beta \in [0, 1]$ is a parameter that controls the relative contributions of
popularity and similarity.

The described rule for establishing new connections admits a simple geometric interpretation,
which is very useful for analytical treatment of the model. Let us define the radial coordinate of
node $s$ at time $s$ as $r_s = 2 \ln s$, and let it grow with time, so that at time $t > s$ it is
$r_s(t) = \beta r_s + (1 - \beta) r_t$. The distance $x_{st}$ between two points in the hyperbolic
plane of curvature $K = -1$ with polar coordinates $(r_s(t), \theta_s)$ and $(r_t, \theta_t)$ is
approximately 00]

$$x_{st} = r_s(t) + r_t + 2 \ln\frac{\theta_{st}}{2} = 2 \ln\left(\frac{s^{\beta}\, t^{2-\beta}\, \theta_{st}}{2}\right).$$

Since for any given $t$ the sets of nodes $s < t$ that minimize $s^{\beta}\theta_{st}$ and $x_{st}$
are the same, new nodes simply connect to the hyperbolically closest existing nodes. Note that the
increase of the radial coordinate $r_s(t)$ decreases the effective age of the node, and thus models
the effect of popularity fading observed in many real networks ED.
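As a minimal sketch of the distance formulas in this paragraph, the following Python fragment computes the angular distance, the faded radial coordinate, and the approximate hyperbolic distance. The function names are illustrative and do not come from the paper, and the sketch assumes that the two angular coordinates are distinct so that the logarithm is defined.

```python
import math


def angular_distance(theta_s, theta_t):
    """theta_st = pi - |pi - |theta_s - theta_t||, a value in [0, pi]."""
    return math.pi - abs(math.pi - abs(theta_s - theta_t))


def faded_radius(s, t, beta):
    """r_s(t) = beta * r_s + (1 - beta) * r_t, with r_s = 2 ln s and r_t = 2 ln t."""
    return beta * 2.0 * math.log(s) + (1.0 - beta) * 2.0 * math.log(t)


def approx_hyperbolic_distance(s, theta_s, t, theta_t, beta):
    """x_st = r_s(t) + r_t + 2 ln(theta_st / 2); assumes theta_s != theta_t."""
    theta_st = angular_distance(theta_s, theta_t)
    r_t = 2.0 * math.log(t)
    return faded_radius(s, t, beta) + r_t + 2.0 * math.log(theta_st / 2.0)
```

Because the logarithm is monotonic, sorting existing nodes by `approx_hyperbolic_distance` gives the same order as sorting them by $s^{\beta}\theta_{st}$ for a fixed new node $t$.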

But how do new nodes find their positions in this similarity space? The main assumption of our
model is that the hidden attribute space $\mathcal{A}$ of a growing network is likely to contain
“hot” regions (e.g. of human activity), and that the hotter the region, the more attractive it is
for new nodes. Hot regions can, for instance, represent hot areas in science. When these regions
are projected onto the similarity space $S^1$, the hotness manifests itself as a higher node
density: more scientists working in a hot area. The higher attractiveness of a hot region is then
modeled by placing a new node in this region with a probability that increases with the hotness of
the region, i.e. with the node density in it. That is, new scientists are expected to begin their
careers working in hot areas where many existing scientists are already active, rather than jumping
onto some obscure developments that nobody understands. Therefore the higher the node density in a
particular section of our similarity space $S^1$, the higher the probability that a new node is
placed in this section. Intuitively, we would expect this process to lead to heterogeneous
distributions of node coordinates in the similarity space. This intuition is confirmed by empirical
results: if we map real networks to their hyperbolic spaces [42, 43], we observe that the resulting
empirical angular node density is not uniform (e.g. see Fig. 5(a)), and nodes tend to cluster into
tight communities. In the Internet, for example, these communities are groups of Autonomous Systems
belonging to the same country.



FIG. 1: Geometric preferential attachment. At time $t$, a new node appears at distance $r_t$ from
the center of the hyperbolic disk, denoted by a cross. Points $\varphi_1$ and $\varphi_2$ represent
two potential locations of the new node, and the drop-shaped curves are the boundaries of the
hyperbolic disks $D_{\varphi_1}(r_t)$ and $D_{\varphi_2}(r_t)$ of radius $r_t$ centered at
$\varphi_1$ and $\varphi_2$. Since similarity is attractive and $D_{\varphi_1}(r_t)$ contains more
nodes (five) than $D_{\varphi_2}(r_t)$ (none), the new node is more likely to appear at
$\theta_t = \varphi_1$.


There are many ways to implement this general idea. For a variety of reasons, we found that the
most natural and consistent one is as follows. First we define the attractiveness of any location
$\varphi \in S^1$ for a new node $t$ with radial coordinate $r_t$ as the number of existing nodes
$s < t$ lying in the hyperbolic disk $D_{\varphi}(r_t)$ of radius $r_t$ centered at
$(r_t, \varphi)$. The higher the attractiveness of a location $\varphi$, the higher the probability
that a new node $t$ will choose this location as its place $\theta_t = \varphi$ in the similarity
space. We refer to this mechanism as the geometric preferential attachment (GPA) of nodes to the
similarity space. This mechanism is illustrated in Fig. 1.

The exact definition of the GPA model is: 

0. Initially the network is empty. New nodes $t$ appear one at a time, $t = 1, 2, \ldots$, and for
each $t$:

1. The angular (similarity) coordinate $\theta_t$ of a new node $t$ is determined as follows:

(a) Sample $\varphi_i \sim U[0, 2\pi]$, $i = 1, \ldots, t$, uniformly at random. The points
$t_1 = (r_t, \varphi_1), \ldots, t_t = (r_t, \varphi_t)$ in the hyperbolic plane are the
“candidate” positions for the newborn node;

(b) Define the attractiveness $A_t(\varphi_i)$ of the $i$th candidate as the number of existing
nodes that lie within hyperbolic distance $r_t$ from it;

(c) Set $\theta_t = \varphi_i$ with probability

$$\Pi_t(\varphi_i) = \frac{A_t(\varphi_i) + \Lambda}{\sum_{j=1}^{t} \left(A_t(\varphi_j) + \Lambda\right)}, \tag{1}$$

where $\Lambda \geq 0$ is a parameter, called the initial attractiveness.

2. The radial (popularity) coordinate of node $t$ is set to $r_t = 2 \ln t$. The radial coordinates
of existing nodes $s < t$ are updated to $r_s(t) = \beta r_s + (1 - \beta) r_t$.

3. Node $t$ connects to the $m$ hyperbolically closest existing nodes (if $t \leq m$, then node $t$
connects to all existing nodes).
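The steps above can be sketched in a few lines of Python. This is an illustrative simulation under stated assumptions, not the authors’ implementation: the name `gpa_network` and its arguments are hypothetical, the approximate distance $x_{st} = r_s(t) + r_t + 2\ln(\theta_{st}/2)$ is used both for the attractiveness count and for linking, the faded radii $r_s(t)$ are used when counting attractiveness, and a uniform choice is made when every candidate has zero weight (which can happen for $\Lambda = 0$). The cost grows roughly as $n^3$, so the sketch is only meant for small $n$.

```python
import math
import random


def angular_distance(a, b):
    """theta_st = pi - |pi - |a - b||."""
    return math.pi - abs(math.pi - abs(a - b))


def gpa_network(n, m, beta, lam, seed=None):
    """Grow a GPA network; returns (angles, edges) with nodes labeled 1..n."""
    rng = random.Random(seed)
    thetas = []  # thetas[s - 1] is the angular coordinate of node s
    edges = []   # pairs (s, t) with s < t
    for t in range(1, n + 1):
        r_t = 2.0 * math.log(t)
        # Faded radii r_s(t) of the existing nodes s = 1..t-1 (assumption: used in Step 1 too).
        radii = [beta * 2.0 * math.log(s) + (1.0 - beta) * r_t for s in range(1, t)]
        # Step 1(a): t candidate angles, uniform on [0, 2 pi].
        candidates = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(t)]
        # Step 1(b): x_st <= r_t  is equivalent to  theta_st <= 2 exp(-r_s(t) / 2).
        reach = [2.0 * math.exp(-r / 2.0) for r in radii]
        weights = []
        for phi in candidates:
            attractiveness = sum(
                1 for th, lim in zip(thetas, reach) if angular_distance(th, phi) <= lim
            )
            weights.append(attractiveness + lam)
        # Step 1(c): pick a candidate with probability proportional to A_t + Lambda.
        if sum(weights) > 0:
            theta_t = rng.choices(candidates, weights=weights, k=1)[0]
        else:
            theta_t = rng.choice(candidates)  # assumption: uniform fallback
        # Step 3: link to the m hyperbolically closest existing nodes.
        def distance(s):
            d = max(angular_distance(thetas[s - 1], theta_t), 1e-12)
            return radii[s - 1] + r_t + 2.0 * math.log(d / 2.0)
        targets = sorted(range(1, t), key=distance)[:m]
        edges.extend((s, t) for s in targets)
        thetas.append(theta_t)
    return thetas, edges
```

For example, `gpa_network(200, 3, 2 / 3, 1.0, seed=0)` returns 200 angular coordinates and an edge list whose average degree is close to $2m$.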


The GPA model thus has three parameters: the number of links $m$ established by every new node, the
speed of popularity fading $\beta$, and the initial attractiveness $\Lambda$. A moment’s thought
shows that $m$ controls the average degree of the network, $\bar{k} = 2m$. We prove in Methods that
the model generates scale-free networks and that $\beta$ controls the power-law exponent $\gamma$.
The initial attractiveness $\Lambda$ controls the heterogeneity of the angular node density: the
heterogeneity is a decreasing function of $\Lambda$. When $\Lambda \to \infty$, the GPA model
becomes manifestly identical to the homogeneous popularity × similarity (PS) model [36], where the
angular coordinate $\theta_t$ of a new node $t$ is sampled uniformly at random on $[0, 2\pi]$.
Note, however, that in GPA, choosing a position in the similarity space is an active decision made
by a node based on the attractiveness of different locations, as opposed to “passive” uniform
randomness in PS. In standard PA, the initial attractiveness term is used to control the exponent
of the power-law degree distribution 13 El- In what follows, we show that in GPA, $\Lambda$
controls certain properties of the community size distribution.

Figure 2 shows the simulation results for networks of size $n = 10^3$ generated by the GPA model
with $m = 3$ (i.e. each new node connects to the three hyperbolically closest nodes),
$\beta = 2/3$, and different values of $\Lambda$. As expected, the smaller the value of $\Lambda$,
the more heterogeneous the distribution of angular coordinates. To quantify the difference between
the empirical distribution of the angular coordinates and the uniform distribution on $[0, 2\pi]$,
we use the Kolmogorov-Smirnov (KS) statistic, one of the standard distances that measure the
difference between two probability distributions. Recall that the KS statistic $\rho$ is defined as
the maximum difference between the values of the empirical distribution $F_n(\theta)$ of the sample
$\theta_1, \ldots, \theta_n$ and the uniform distribution $F_{U[0, 2\pi]}(\theta) = \theta / 2\pi$,

$$\rho = \max_{\theta \in [0, 2\pi]} \left| F_n(\theta) - \frac{\theta}{2\pi} \right|. \tag{2}$$

The KS statistic as a function of $\Lambda$ is shown in the bottom panel of Fig. 2. As expected,
$\rho(\Lambda)$ is a decreasing function of $\Lambda$.
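A short sketch of how the KS statistic of Eq. (2) can be evaluated for a sample of angular coordinates is given below. The function name `ks_uniform_circle` is illustrative; the sketch relies on the standard fact that the maximum deviation of an empirical distribution function is attained at one of the sample points, just before or just after the jump.

```python
import math


def ks_uniform_circle(thetas):
    """KS distance between the empirical CDF of angles in [0, 2 pi] and theta / (2 pi)."""
    xs = sorted(th / (2.0 * math.pi) for th in thetas)
    n = len(xs)
    # At the i-th order statistic (0-based) the empirical CDF jumps from i/n to (i+1)/n.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
```

A sample concentrated at a single angle gives a value close to one, while evenly spread angles give a value of order $1/n$.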


[Figure 2 image. Panel title: $\Lambda = 0.1$; histogram x-axis: angular coordinate $\theta$.]

FIG. 2: GPA networks. Synthetic networks of size $n = 10^3$ generated according to the GPA model
with $m = 3$, $\beta = 2/3$, and $\Lambda = 0.1$ (first row), $\Lambda = 1$ (second row), and
$\Lambda = 10$ (third row). The right column shows the corresponding histograms of the angular node
densities. The bottom panel plots the expected KS statistic $\rho$, Eq. (2), as a function of
$\Lambda$. For each value of $\Lambda$, $\rho(\Lambda)$ is computed by averaging the KS statistics
for 100 independently generated networks.


B. Degree Distribution 


For each of the three networks depicted in Fig. 2, the statistical procedure for quantifying
power-law behavior in empirical data proposed in [44] accepts the hypothesis that the network is
scale-free. It estimates the lower cutoff for the scaling region as $k_{\min} = 3$, which is
consistent with the minimum degree in the networks, $m = 3$. Figure 3(a) shows a doubly logarithmic
plot of the empirical degree distributions $P(k) \sim k^{-\gamma}$ along with the fitted power law
with exponent $\gamma = 2.5$.

These empirical results show that the degree distribution of a network generated by GPA appears to
be a power law. Moreover, quite unexpectedly, the power-law exponent $\gamma$ remains similar for
different values of $\Lambda$. These results can be proved analytically (see Methods for details).
Remarkably, for any value of $\Lambda$, the GPA model produces scale-free networks with a power-law
degree distribution identical to the degree distribution in networks growing according to PA, with
power-law exponent $\gamma = 1 + 1/\beta$.
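To make the comparison between simulated degrees and the predicted exponent concrete, here is a small sketch that computes the empirical complementary cumulative degree distribution and the exponent $\gamma = 1 + 1/\beta$. The helper names are illustrative; the fitting procedure of [44] is not reproduced here.

```python
from collections import Counter


def degree_ccdf(degrees):
    """Empirical CCDF: maps each observed degree k to the fraction of nodes with degree >= k."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf = {}
    remaining = n
    for k in sorted(counts):
        ccdf[k] = remaining / n
        remaining -= counts[k]
    return ccdf


def predicted_exponent(beta):
    """Power-law exponent gamma = 1 + 1 / beta stated for the GPA model."""
    return 1.0 + 1.0 / beta
```

With $\beta = 2/3$, as in the simulations of Fig. 2, `predicted_exponent` returns approximately 2.5, the exponent of the fitted power law quoted above.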











[Figure 3 image. x-axis of both panels: degree $k$ (logarithmic scale).]

FIG. 3: Degree distribution and clustering. Panel (a) shows the empirical complementary cumulative
degree distribution functions (CCDF) $P_c(k) = \sum_{k' \geq k} P(k')$ for the networks shown in
Fig. 2 versus the corresponding power-law fit. The average clustering coefficient $\bar{c}(k)$ as a
function of node degree $k$ for these networks is shown in panel (b). The mean clustering is
$\bar{c} = 0.88$ for all networks.


C. Clustering Coefficient 

The concept of clustering [45] quantifies the tendency to form cliques (complete subgraphs) in the
neighborhood of a given node. Specifically, the local clustering coefficient of node $s$ is defined
as the probability that two nodes $s'$ and $s''$, adjacent to $s$, are also connected to each
other. Figure 3(b) shows the average value of the clustering coefficient $\bar{c}(k)$ for nodes of
degree $k$ as a function of $k$ for the three networks in Fig. 2. Interestingly, clustering does
not depend on $\Lambda$ either (a proof is in the Methods), and scales approximately as $k^{-1}$.
This means that, on average, nodes with higher degree have lower clustering, which is consistent
with empirical observations of clustering in real complex networks mm. For all three GPA networks,
the mean clustering (the average of the local clustering coefficients) is high, $\bar{c} = 0.88$.
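The definition of the local clustering coefficient can be sketched as follows. The function name and the adjacency representation (a dictionary mapping each node to the set of its neighbors) are assumptions made for this illustration; nodes with fewer than two neighbors are assigned zero clustering by convention.

```python
from itertools import combinations


def local_clustering(adj):
    """Fraction of pairs of neighbors of each node that are themselves linked."""
    result = {}
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            result[node] = 0.0  # convention assumed here for degree 0 or 1
            continue
        linked_pairs = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        result[node] = 2.0 * linked_pairs / (k * (k - 1))
    return result
```

The mean clustering is then the average of the returned values over all nodes, and $\bar{c}(k)$ is the average over nodes of degree $k$.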


D. Soft Communities 

The hyperbolic space underlying a network and the GPA mechanism of node appearance in that space
naturally induce community structure and make it possible to detect communities in a very intuitive
and simple way. A higher density of links within a community indicates that its nodes are more
similar to each other than to the other nodes, because links connect only nodes located within a
certain similarity distance threshold. All such densely linked nodes are thus close to each other
in some area of the similarity space, meaning that the spatial node density is high in this area.
Therefore a community becomes a cluster of spatially close nodes, and the community structure is
encoded in a non-uniform distribution of angular (similarity) coordinates of nodes.

FIG. 4: Statistics of the rescaled gaps. The top panel shows the values of the rescaled gaps
$\Delta\theta / \Delta\theta_c$ for three networks of size $n = 10^4$ generated by the GPA model
with $\Lambda = 0.1$ (left), $\Lambda = 1$ (middle), and $\Lambda = 10$ (right). The bottom panel
shows the sample autocorrelation function of the series in the top panel.

Following the approach in [37], let us consider the angular gaps $\Delta\theta$ between consecutive
nodes, and define a soft community as a group of nodes separated from the rest of the network by
two gaps that exceed a certain critical value $\Delta\theta_c$. If a network has a total of $n$
nodes, then the critical gap $\Delta\theta_c$ is defined as the expected value of the largest gap
$\Delta\theta_{(n)} = \max\{\Delta\theta_1, \ldots, \Delta\theta_n\}$, where
$\theta_1, \ldots, \theta_n$ are distributed uniformly at random on $[0, 2\pi]$. The rationale
behind this definition is that if nodes are distributed uniformly in the similarity space, and
there are no communities, then we do not expect to find any pair of nodes separated by a gap larger
than this $\Delta\theta_c$. The calculations in the Methods show that the critical gap is
approximately

$$\Delta\theta_c = \frac{2\pi \ln n}{n}. \tag{3}$$
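The soft-community definition can be turned into a short procedure: sort the nodes by angle, compute the gaps between consecutive nodes around the circle, and cut the circle at every gap that exceeds the critical gap of Eq. (3). The sketch below is one possible reading of that definition; the function names and the return format (members, left gap, right gap) are choices made for this illustration.

```python
import math


def critical_gap(n):
    """Critical gap of Eq. (3): 2 pi ln(n) / n."""
    return 2.0 * math.pi * math.log(n) / n


def soft_communities(thetas, gap_c=None):
    """Split nodes (indices into thetas) into groups bounded by gaps larger than gap_c."""
    n = len(thetas)
    if gap_c is None:
        gap_c = critical_gap(n)
    order = sorted(range(n), key=lambda i: thetas[i])
    two_pi = 2.0 * math.pi
    # gaps[j] is the gap between order[j] and the next node around the circle.
    gaps = [(thetas[order[(j + 1) % n]] - thetas[order[j]]) % two_pi for j in range(n)]
    cuts = [j for j, g in enumerate(gaps) if g > gap_c]
    if not cuts:
        return [(order, None, None)]  # no gap exceeds the threshold: one group
    communities = []
    for a, b in zip(cuts, cuts[1:] + [cuts[0] + n]):
        members = [order[j % n] for j in range(a + 1, b + 1)]
        communities.append((members, gaps[a], gaps[b % n]))
    return communities
```

Each returned tuple holds the node indices of one soft community together with the two bounding gaps, which is the information needed to evaluate how well that community is separated from its neighbors.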


Figure 4 shows the statistics of the rescaled gaps $\Delta\theta / \Delta\theta_c$ for three
GPA-generated networks of size $n = 10^4$ with $\Lambda = 0.1$, $1$, and $10$. In the top panel, we
can see the organization of nodes on the circle, with many consecutive small gaps
($\Delta\theta_i < \Delta\theta_c$) indicating groups of similar nodes (communities) separated by
large gaps ($\Delta\theta_i > \Delta\theta_c$), which constitute boundaries between communities,
so-called “fault lines” 0. As expected, smaller values of $\Lambda$ result in a more heterogeneous
distribution of gaps with strong long-range correlations. This effect is clearly visible in the
bottom panel, where the sample autocorrelation function is shown: the smaller the $\Lambda$, the
slower the autocorrelation decays.

Having a geometric interpretation of the community structure, it is now easy to quantify how well
communities are separated from each other. For each community $C$, we define its separation from
the rest of the network, $S(C)$, as the rescaled average of the two gaps
$\Delta\theta_1, \Delta\theta_2 > \Delta\theta_c$ that separate $C$ from its neighboring
communities,

$$S(C) = \frac{\Delta\theta_1 + \Delta\theta_2}{2 \Delta\theta_c}. \tag{4}$$


The mean community separation, i.e. the expected separation of the community that a randomly chosen
node belongs to, can then be computed as follows:

$$\bar{S} = \sum_{i=1}^{n_c} \frac{n_i}{n} S(C_i), \tag{5}$$

where $n_i$ is the size of community $C_i$ and $n_c$ is the total number of communities. The
network metric $\bar{S}$ can also be viewed as a measure of narrowness (or specialization) of
communities. For example, in a scientific collaboration network, where nodes represent scientists
and communities correspond to groups with similar research interests, $\bar{S}$ quantifies the
degree of interdisciplinarity in the network. When $\bar{S}$ is large, the boundaries between
communities are sharp and each community focuses on its narrow, specific topic. On the other hand,
if $\bar{S}$ is close to one, then the boundaries are blurred, communities are widely spread, and
the network is highly interdisciplinary.
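Equations (4) and (5) amount to a size-weighted average, as the following sketch shows. The input format, a list of `(size, gap_1, gap_2)` tuples with one entry per community, is an assumption made for this illustration.

```python
def community_separation(gap_1, gap_2, gap_c):
    """Eq. (4): average of the two bounding gaps, rescaled by the critical gap."""
    return (gap_1 + gap_2) / (2.0 * gap_c)


def mean_separation(communities, gap_c):
    """Eq. (5): separation averaged over communities, weighted by community size."""
    n = sum(size for size, _, _ in communities)
    return sum(
        size / n * community_separation(gap_1, gap_2, gap_c)
        for size, gap_1, gap_2 in communities
    )
```

Weighting by $n_i / n$ is what makes the result the expected separation of the community of a randomly chosen node, rather than of a randomly chosen community.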

The difference in the stochastic behavior of the rescaled gaps in Fig. 4 suggests that the initial
attractiveness $\Lambda$ controls the mean community separation $\bar{S}$ in the GPA-generated
networks. This is confirmed by simulation results shown in Fig. 5(c), where $\bar{S}$ is shown as a
function of $\Lambda$. As expected, $\bar{S}(\Lambda)$ is a monotonically decreasing function,
approaching one when $\Lambda$ is large.


E. The Internet 

To demonstrate the ability of the GPA mechanism to generate graphs that are similar to real
networks, and, in particular, to reproduce real non-uniform distributions of similarity node
coordinates, we consider the Autonomous Systems (AS) Internet topology [48] of December 2009. The
network consists of $N = 25910$ nodes (ASs) and $M = 63435$ links that represent logical
relationships between ASs. We embed the AS Internet into its hyperbolic space, i.e. compute the
popularity and similarity coordinates $\{r_i, \theta_i\}$, using HyperMap [43], an efficient
network mapping algorithm that estimates the latent hyperbolic coordinates of nodes. The network
topology has a power-law degree distribution with $\gamma = 2.1$ and average node degree
$\bar{k} \approx 5$. This automatically determines two out of three parameters of the GPA model:
$m = \bar{k}/2$ and $\beta = 1/(\gamma - 1)$. In Methods, we explain how to infer the value of
$\Lambda$ from network data using the maximum likelihood method. Here we consider the snapshot of
the AS Internet based on the first $n = 10^3$ nodes. The corresponding estimated value of the
initial attractiveness is $\Lambda_{\mathrm{int}} = 0.7$.

FIG. 5: AS Internet vs GPA networks. Panel (a) shows the histogram of the angular (similarity)
coordinates $\{\theta_i\}$ for the snapshot of the AS Internet consisting of the first $n = 10^3$
nodes. All $\{\theta_i\}$ are inferred by HyperMap [43]. Panel (b) compares the KS statistics for
the Internet and synthetic networks generated by the GPA model (box plot) with $\gamma = 2.1$ and
$\Lambda = 0.7$. The central red mark is the median, the blue horizontal edges of the box are the
25th and 75th percentiles, and the black whiskers extend to the most extreme data points not
considered outliers. The box plot is obtained from 100 independently generated networks. Panel (c)
shows the perfect match between real and synthetic values of the mean community separation
$\bar{S}$. Error bars represent plus and minus one standard deviation. Panel (d) juxtaposes the
empirical CCDF of the soft community sizes in the Internet against CCDFs obtained for the three
GPA-generated networks. Panel (e) shows the temporal evolution of the maximum likelihood estimate
$\Lambda_{\mathrm{MLE}}(t)$ for the AS Internet, where the node birth times are their ranks in the
decreasing degree order. The yellow star corresponds to the considered snapshot with $n = 10^3$
nodes and $\Lambda_{\mathrm{MLE}} = 0.7$.
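As a worked check of the parameter values quoted in this subsection for the AS Internet, the arithmetic can be written out explicitly. The only assumption added here is the usual definition of the average degree of an undirected graph, $\bar{k} = 2M/N$.

```latex
\bar{k} = \frac{2M}{N} = \frac{2 \times 63435}{25910} \approx 4.9, \qquad
m = \frac{\bar{k}}{2} = \frac{M}{N} \approx 2.45, \qquad
\beta = \frac{1}{\gamma - 1} = \frac{1}{2.1 - 1} \approx 0.91 .
```

The first value agrees with the rounded figure $\bar{k} \approx 5$, and the last satisfies $\gamma = 1 + 1/\beta \approx 2.1$.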

Figure 5(a) shows the histogram of the angular node density for the AS Internet snapshot. We note
that it is far from uniform, which is a direct indication of the presence of soft communities. We
quantify the degree of heterogeneity of the angular density by the KS distance from the uniform
distribution, Eq. (2), and juxtapose it against the KS distances computed for networks generated by
the GPA model with $\Lambda = 0.7$ (Fig. 5(b)). The Internet value lies within the 25th and 75th
percentiles of the synthetic values, which shows that the degrees of non-uniformity in the Internet
and GPA networks are comparable. Fig. 5(c) compares the real network with its synthetic counterpart
in terms of the expected mean community separation $\bar{S}$. The GPA mechanism generates networks
with $\bar{S}$ that match the Internet value very well. In Fig. 5(d), we compare the community size
distributions in the Internet snapshot and the prediction given by the GPA model. Whereas $\bar{S}$
for the Internet and GPA networks are essentially identical, the KS statistics and community size
distributions are similar, but the match is not perfect. This effect is explained by the systematic
bias present in the inferred values of the angular coordinates $\{\theta_i\}$. Indeed, the HyperMap
method first assumes that all angular coordinates are uniformly distributed over the similarity
space $S^1$, i.e. $\Lambda = \infty$, and then perturbs them to maximize a certain likelihood
function. This “smoothes” the inferred angular node density and makes it more homogeneous than the
true distribution. Nevertheless, although the inferred value of $\Lambda$ is only an approximation
of the true value, the GPA model still captures well the degree of heterogeneity in the real
network.

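Finally, we note that GPA defined in Eq. (1) admits an interesting interpretation that suggests a
model extension that may be useful for real network analysis. The probability that a new node born
at time $t$ chooses the angular position $\varphi_i$ can be written as

$$\Pi_t(\varphi_i) = p_f \frac{A_t(\varphi_i)}{t \langle A_t \rangle} + (1 - p_f) \frac{1}{t}, \tag{6}$$

where

$$p_f = \frac{\langle A_t \rangle}{\langle A_t \rangle + \Lambda} \quad \text{and} \quad \langle A_t \rangle = \frac{1}{t} \sum_{i=1}^{t} A_t(\varphi_i). \tag{7}$$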

Therefore the event of choosing a position on the circle can be understood as follows. With
probability $p_f$ the new node is a follower and chooses its position according to pure GPA
($\Lambda = 0$). With the remaining probability $1 - p_f$ the new node chooses its position
uniformly at random among the $t$ available positions. We note that $\Lambda$ controls $p_f$, since
$\langle A_t \rangle \sim 1$. When $\Lambda$ is constant, $p_f$ is also constant, and consequently
there is always a fraction of nodes that are placed at random locations. At long times, these
random nodes diminish the effect of pure GPA, and eventually the angular distribution of nodes
becomes indistinguishable from a Poisson point process on the circle. We can then wonder whether a
constant value of $\Lambda$ is a realistic assumption for dealing with real networks. In scientific
citation networks, for example, when a new field of science is being formed, and not much work has
yet been done in it, scientists may decide either to explore a new line of research within the
field, or to follow one of the mainstream existing lines. The former case can be modeled by a
random choice of the angular position, assuming that subfields are homogeneously distributed. The
latter is modeled by the pure GPA term in Eq. (6). However, there is a payoff that does not remain
constant during the evolution of the field. At early times, the chances to find an interesting
result that would be highly cited and followed by others are very high. At late times, the topic
space is crowded and the chances to find something fundamentally new are very slim. Therefore,
there is a higher incentive for scientists to take higher risks at early times. This can be modeled
by $p_f$ increasing with time, converging to a value close to 1 as time grows to infinity. In turn,
this means that $\Lambda$ is a decreasing function of time, having a large value at the beginning
of network evolution, and decreasing to small values afterwards.
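The follower interpretation can be checked numerically: mixing pure GPA (with probability $p_f$) and a uniform choice (with probability $1 - p_f$) should reproduce the placement probabilities of Eq. (1). The sketch below assumes the reading $p_f = \langle A_t \rangle / (\langle A_t \rangle + \Lambda)$ with $\langle A_t \rangle$ the mean attractiveness of the $t$ candidates, and requires at least one candidate with non-zero attractiveness. The function names are illustrative.

```python
def eq1_probabilities(attractiveness, lam):
    """Placement probabilities (A_i + Lambda) / sum_j (A_j + Lambda)."""
    total = sum(a + lam for a in attractiveness)
    return [(a + lam) / total for a in attractiveness]


def mixture_probabilities(attractiveness, lam):
    """Follower/uniform mixture; returns (p_f, probabilities). Needs mean attractiveness > 0."""
    t = len(attractiveness)
    mean_a = sum(attractiveness) / t
    p_f = mean_a / (mean_a + lam)
    probs = [p_f * a / (t * mean_a) + (1.0 - p_f) / t for a in attractiveness]
    return p_f, probs
```

For a fixed mean attractiveness, a larger `lam` gives a smaller `p_f`, so a $\Lambda$ that decays in time corresponds to a follower probability that grows in time.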

Unfortunately, measuring the temporal evolution of $\Lambda$ in a real network is not yet possible
because there currently exists no parametric theory describing such evolution that could be used
for statistical inference of $\Lambda$. However, it is fairly easy to find an approximate value of
$\Lambda$ as a function of time as follows. If timestamps of a real complex network are available,
we can pretend that $\Lambda$ is constant, and infer its value, $\Lambda_{\mathrm{MLE}}(t)$, using
the MLE techniques described in the Methods for subgraphs made of nodes that were born before a
given time $t$. This value can be thought of as a (possibly weighted) average of $\Lambda$ over the
time window $(0, t)$. By increasing the value of $t$, we can detect whether $\Lambda$ is constant
(if $\Lambda_{\mathrm{MLE}}(t)$ does not change with time, beyond statistical fluctuations), or a
decreasing function of time. Figure 5(e) shows $\Lambda_{\mathrm{MLE}}(t)$ for the AS Internet,
where the strong temporal dependence of $\Lambda$ is evident.


III. DISCUSSION 

In summary, hyperbolic network geometry, combining the popularity and similarity forces driving
network evolution, and coupled with preferential attachment of nodes to this geometry (GPA),
naturally yields scale-free, strongly clustered growing networks with emergent soft community
structure. The GPA model has three parameters that can be readily inferred from network data. Using
the AS Internet topology as an example, we have seen that the GPA mechanism generates heterogeneous
networks that are similar to real networks with respect to key properties, including key aspects of
the community size distribution and separation. The mean community separation, a new metric that
quantifies the narrowness of communities in a network, is controlled in GPA by the initial
attractiveness $\Lambda$, which controls the power-law exponent in standard PA.

In the context of the asymptotic equivalence between de Sitter causal sets and popularity ×
similarity (PS) hyperbolic networks established in [49], we note that $\Lambda$ is conceptually
similar to the cosmological constant $\Lambda$ in Einstein’s equations in general relativity (GR),
where it is also an additive term in the proportionality between the energy-momentum tensor and
spacetime curvature. Causal sets [50, 51] are random graphs obtained by Poisson sprinkling a
collection of nodes onto (a patch of) a Lorentzian manifold; edges in these graphs connect all
timelike-separated pairs of nodes. If there is no matter (empty spacetime) but there is only dark
energy (positive $\Lambda$), then the solution of Einstein’s equations is the de Sitter spacetime,
and the main theorem in (42) states that the ensemble of PS graphs is asymptotically
($n \to \infty$) identical to an ensemble of causal sets sprinkled onto de Sitter spacetime, which
is one of the three maximally symmetric, homogeneous and isotropic Lorentzian manifolds (the other
two are Minkowski and anti-de Sitter spacetimes). In this context, the GPA model considered here is
a model with cosmological constant $\Lambda$ and matter. Modeled by high node density, this matter,
as in GR, “attracts more matter,” thus increasing the spacetime curvature of which the node density
is a proxy. Indeed, the main feature of the model is that the higher the node density in a
particular region of space, the more nodes will appear in this region later. The main difference
with GR is that here we essentially have an analogy with only the 00-component of Einstein’s
equations. One can envision that other components should describe the coupled dynamics of the
similarity space and nodes in it. In the case of a scientific collaboration network, for example,
that would be the co-evolution of science (space) itself and the interests of scientists (node
dynamics in this space). In the model considered here, nodes do not move. Finding the laws of their
spatial dynamics that may further strengthen the analogy with general relativity is a promising but
challenging research direction.

In that context, the decay of the initial attractiveness $\Lambda$ that we found in the Internet
must be analogous to the decay of the cosmological constant $\Lambda$ in modern cosmological
theories. Cosmic inflation [52, 53] is widely accepted as the most plausible resolution of many
problems with the classical big bang theory, including the flatness problem, the horizon problem,
and the magnetic-monopole problem. Inflation is an initial period of accelerated expansion of the
universe during which gravity was repulsive. Inflation does not last long, and can be modeled as a
time-dependent cosmological “constant” $\Lambda$ that initially has a high value and then decays to
zero. The analogies between GPA with decaying $\Lambda$ and inflation go even further, producing
similar outcomes as far as the spatial distribution of events is concerned. Indeed, cosmic
inflation has the effect of smoothing out inhomogeneities so that once inflation is over, the
universe is nearly flat, isotropic, and homogeneous, except for quantum fluctuations of the
inflaton field. These fluctuations are the seeds of future inhomogeneities that we observe in the
universe at scales smaller than 100 Mpc. In the GPA context, a high value of $\Lambda$ also has a
homogenizing effect. Indeed, if $\Lambda$ is large, then $p_f$ is small, and new nodes choose their
angular positions at random, producing a Poisson point process on the circle. Once $\Lambda$ is
small enough, we are left with a random distribution of points with Poisson fluctuations that, as
in the universe, are the seeds of future communities in the network (galaxies in the universe),
because once $\Lambda$ is nearly zero, these initial fluctuations are reinforced by pure
preferential attachment.


IV. METHODS 

A. Invariance of the degree distribution and clustering 

Here we prove that the degree distribution and clustering
coefficient in the networks generated by the GPA model do
not depend on the initial attractiveness $\Lambda$. Moreover, the
degree distribution is a power law with exponent $\gamma = 1 + 1/\beta$.


The proof can be reduced to the proof for the homogeneous PS
model [36] (Supplementary Information, Section IV). Consider
a new node $t$, and let $R_t$ be the radius of a hyperbolic
disk centered at this node such that $t$ is connected to all nodes
$s < t$ that lie in this disk. Then the probability $P_{\mathrm{GPA}}(s,t)$
that nodes $t$ and $s < t$ in the GPA model are connected can be
computed as follows:

$$
P_{\mathrm{GPA}}(s,t) = P(x_{st} \le R_t)
= P\left(\theta_{st} \le 2e^{-\frac{r_s(t)+r_t-R_t}{2}}\right), \tag{8}
$$

where $x_{st} = r_s(t) + r_t + 2\ln(\theta_{st}/2)$ is the hyperbolic distance
between nodes $s = (r_s(t), \theta_s)$ and $t = (r_t, \theta_t)$ at time $t$,
and $\theta_{st}$ is the angular distance between them.
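Using the total probability theorem,

$$
\begin{aligned}
P_{\mathrm{GPA}}(s,t)
&= \sum_{i=1}^{t} P\left(\theta_{st} \le 2e^{-\frac{r_s(t)+r_t-R_t}{2}} \,\middle|\, t = t_i\right) P(t = t_i) \\
&= \sum_{i=1}^{t} P\left(\theta_{s t_i} \le 2e^{-\frac{r_s(t)+r_t-R_t}{2}}\right) \Pi_t(i),
\end{aligned} \tag{9}
$$

where $t_i$ are the candidate positions generated at Step 1(a),
and $\Pi_t(i)$ are the corresponding acceptance probabilities.
Applying the total probability theorem with respect to node $s$,
we have: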


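$$
\begin{aligned}
P_{\mathrm{GPA}}(s,t)
&= \sum_{i=1}^{t}\sum_{j=1}^{s} P\left(\theta_{st} \le 2e^{-\frac{r_s(t)+r_t-R_t}{2}} \,\middle|\, t = t_i,\ s = s_j\right) P(s = s_j)\,\Pi_t(i) \\
&= \sum_{i=1}^{t}\sum_{j=1}^{s} P\left(\theta_{s_j t_i} \le 2e^{-\frac{r_s(t)+r_t-R_t}{2}}\right) \Pi_s(j)\,\Pi_t(i).
\end{aligned} \tag{10}
$$

Since the angular coordinates of the candidate positions $s_j$
and $t_i$ are uniformly distributed on $[0, 2\pi]$, the probability
$P(\theta_{s_j t_i} \le a)$ is simply $a/\pi$ for $0 \le a \le \pi$. Therefore,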


$$
P_{\mathrm{GPA}}(s,t)
= \frac{2}{\pi}\, e^{-\frac{r_s(t)+r_t-R_t}{2}} \sum_{i=1}^{t}\Pi_t(i) \sum_{j=1}^{s}\Pi_s(j)
= \frac{2}{\pi}\, e^{-\frac{r_s(t)+r_t-R_t}{2}}, \tag{11}
$$

where the last equality holds because $\sum_{i=1}^{t}\Pi_t(i) = \sum_{j=1}^{s}\Pi_s(j) = 1$.
We note that $P_{\mathrm{GPA}}(s,t)$ does not depend on $\Lambda$, and that it is
exactly the same as the probability $P_{\mathrm{PS}}(s,t)$ of having a link
between nodes $t$ and $s < t$ in the homogeneous PS model. The rest of
the proof repeats the proof in [36] without change. This leads to

$$
P_{\mathrm{GPA}}(s,t) = P_{\mathrm{PS}}(s,t) = P_{\mathrm{PA}}(s,t)
= m\,\frac{(s/t)^{-\beta}}{\int_1^t (i/t)^{-\beta}\,di}, \tag{12}
$$

which means that the resulting degree distribution in GPA is
identical to that in PA: it is a power law with exponent $\gamma = 1 + 1/\beta$.
Since the connection probability $P_{\mathrm{GPA}}(s,t)$ does not depend
on $\Lambda$, neither does clustering.
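The step from (10) to (11) rests on the fact that the angular distance between two independent uniform angles is uniform on $[0, \pi]$. The following is a minimal numerical sanity check of that single fact, not part of the proof; the function names and the sample size are illustrative assumptions.

```python
import math
import random


def angular_distance(theta1: float, theta2: float) -> float:
    """Angular distance between two angles on the circle, a value in [0, pi]."""
    return math.pi - abs(math.pi - abs(theta1 - theta2))


def estimate_prob_within(a: float, samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(theta_st <= a) for two independent uniform angles."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        theta_s = rng.uniform(0.0, 2.0 * math.pi)
        theta_t = rng.uniform(0.0, 2.0 * math.pi)
        if angular_distance(theta_s, theta_t) <= a:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    a = 0.8
    # The two printed numbers should be close: the estimate and a / pi.
    print(estimate_prob_within(a), a / math.pi)
```

Because the result depends only on the uniformity of the candidate angles, it is the same whichever candidate is eventually accepted, which is why $\Lambda$ drops out of (11).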














B. Critical gap 


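To obtain a closed-form expression for the critical gap, we
note that for large $n$, the sequence $\theta_1, \ldots, \theta_n \sim U[0, 2\pi]$ can
be approximately viewed as a realization of the Poisson point
process on the circle of unit radius with density $\lambda = n/2\pi$. In
this case, the distribution of the angular gaps is approximately
exponential with rate $\lambda$. The maximum gap $\Delta\theta_{(n)}$ then has
the PDF $f_{\Delta\theta_{(n)}}(x) = n\lambda e^{-\lambda x}\left(1 - e^{-\lambda x}\right)^{n-1}$,
and its expected value can be calculated as follows:

$$
\begin{aligned}
\Delta\theta_c
&= \int_0^{2\pi} x\, n\lambda e^{-\lambda x}\left(1 - e^{-\lambda x}\right)^{n-1} dx
\approx -2\pi \int_0^1 y^{n-1} \ln(1-y)\, dy \\
&= 2\pi \int_0^1 y^{n-1} \sum_{k=1}^{\infty} \frac{y^k}{k}\, dy
= 2\pi \sum_{k=1}^{\infty} \frac{1}{k(n+k)}
= \frac{2\pi H_n}{n}
\approx \frac{2\pi(\ln n + \gamma)}{n}
\approx \frac{2\pi \ln n}{n},
\end{aligned} \tag{13}
$$

where we substituted $y = 1 - e^{-\lambda x}$ and replaced the resulting upper
limit $1 - e^{-n}$ by 1 for large $n$, $H_n$ is the $n$th harmonic number,
and $\gamma$ is Euler’s constant.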
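As a rough numerical illustration of (13), the sketch below compares $2\pi H_n/n$ with the simulated mean of the largest angular gap among $n$ points. It samples the points uniformly on the circle rather than from a Poisson process, and the function names, trial count, and seed are assumptions made for this example only.

```python
import math
import random


def harmonic_number(n: int) -> float:
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def critical_gap(n: int) -> float:
    """Closed-form value 2*pi*H_n/n from Eq. (13)."""
    return 2.0 * math.pi * harmonic_number(n) / n


def simulated_mean_max_gap(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Average largest angular gap between n uniform points on the circle."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        angles = sorted(rng.uniform(0.0, 2.0 * math.pi) for _ in range(n))
        gaps = [b - a for a, b in zip(angles, angles[1:])]
        gaps.append(2.0 * math.pi - angles[-1] + angles[0])  # wrap-around gap
        total += max(gaps)
    return total / trials


if __name__ == "__main__":
    n = 100
    print(critical_gap(n), simulated_mean_max_gap(n), 2.0 * math.pi * math.log(n) / n)
```

The last printed value is the leading-order approximation $2\pi \ln n / n$, which is noticeably smaller than $2\pi H_n/n$ at moderate $n$ because it drops the $\gamma$ term.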


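C. Inference of Λ

The initial attractiveness $\Lambda$ controls the distribution of the
angular coordinates $\theta_1, \ldots, \theta_n$ of the nodes. We therefore first
infer the $\theta_i$ using the HyperMap method [43]. Given the network
embedding $\{(r_i, \theta_i)\}_{i=1}^{n}$ into its hyperbolic space, the
likelihood function $\mathcal{L}(\Lambda \mid \theta_1, \ldots, \theta_n)$ can be written as follows:

$$
\begin{aligned}
\mathcal{L}(\Lambda \mid \theta_1, \ldots, \theta_n)
&= P(\theta_1, \ldots, \theta_n \mid \Lambda) \\
&= P(\theta_1 \mid \Lambda)\, P(\theta_2 \mid \Lambda, \theta_1) \cdots P(\theta_n \mid \Lambda, \theta_1, \ldots, \theta_{n-1}) \\
&\propto \int_0^{2\pi} \frac{\left(A_2(\theta_2) + \Lambda\right) d\varphi_1}{A_2(\theta_2) + A_2(\varphi_1) + 2\Lambda} \times \cdots \\
&\quad \times \int_0^{2\pi}\!\!\cdots\!\int_0^{2\pi}
\frac{\left(A_n(\theta_n) + \Lambda\right) d\varphi_1 \cdots d\varphi_{n-1}}{A_n(\theta_n) + \sum_{i=1}^{n-1} A_n(\varphi_i) + n\Lambda}.
\end{aligned} \tag{14}
$$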



True Λ         0     0.2   0.5   0.7   1     2
Λ_MLE(100)     0     0.3   0.4   0.7   0.7   1.4
Λ_MLE(200)     0     0.2   0.5   0.8   1.3   1.8
Λ_MLE(500)     0     0.2   0.6   0.7   1.1   1.9
Λ_MLE(1000)    0     0.2   0.5   0.7   1.1   1.8

TABLE I: Maximum likelihood estimates. True values of the initial
attractiveness parameter $\Lambda$ and its MLEs $\Lambda_{\mathrm{MLE}}(n_0)$ based on the
first $n_0 = 100$, 200, 500, and 1000 nodes. In all simulations, $N = 100$
Monte Carlo samples were used in (16).



FIG. 6: Log-likelihood functions. The estimated log-likelihood
functions $l(\Lambda \mid \theta_1, \ldots, \theta_n)$ for synthetic networks of size $n = 10^3$
generated by the GPA model with $\Lambda = 0$, 0.2, 0.5, 0.7, 1, and 2.
Each log-likelihood is estimated by (16) using $N = 100$ Monte
Carlo samples and the first $n_0 = 500$ nodes. The horizontal axis of each
panel is the initial attractiveness $\Lambda$.


Here $A_t(\varphi)$ is the attractiveness of location $\varphi \in S^1$, that is,
the number of existing nodes at time $t - 1$ that lie within
distance $r_t$ from $(r_t, \varphi)$. The log-likelihood is then (up to an
additive constant):

$$
l(\Lambda \mid \theta_1, \ldots, \theta_n)
= \sum_{t=2}^{n} \log \int_0^{2\pi}\!\!\cdots\!\int_0^{2\pi}
\frac{\left(A_t(\theta_t) + \Lambda\right) d\varphi_1 \cdots d\varphi_{t-1}}{A_t(\theta_t) + \sum_{i=1}^{t-1} A_t(\varphi_i) + t\Lambda}. \tag{15}
$$

The multiple integrals in (15) cannot be calculated analytically,
since the attractiveness function cannot be written in
closed form. Nevertheless, the log-likelihood can be efficiently
estimated by the Monte Carlo method. First, generate
$N$ Monte Carlo samples, $\varphi_1^{(j)}, \ldots, \varphi_{n-1}^{(j)} \sim U[0, 2\pi]$,
$j = 1, \ldots, N$. The “truncated” samples $(\varphi_1^{(j)}, \ldots, \varphi_{t-1}^{(j)})$ will
be used for estimating the $(t-1)$-dimensional integral in
(15). Next, precompute all needed attractivenesses $A_t(\varphi_i^{(j)})$,
where $t = 2, \ldots, n$ and $i = 1, \ldots, t-1$. Then for each value
of $\Lambda$, the log-likelihood can be estimated as follows (up to a
constant):

$$
l(\Lambda \mid \theta_1, \ldots, \theta_n)
\approx \sum_{t=2}^{n} \log\left( \frac{1}{N} \sum_{j=1}^{N}
\frac{A_t(\theta_t) + \Lambda}{A_t(\theta_t) + \sum_{i=1}^{t-1} A_t(\varphi_i^{(j)}) + t\Lambda} \right). \tag{16}
$$
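The attractivenesses entering (15) and (16) are counts of nodes inside a hyperbolic disk. A minimal sketch of that count is given here, assuming the existing nodes are supplied as polar coordinates $(r, \theta)$ valid at the current time and that the hyperbolic plane has curvature $-1$. The function names are hypothetical, and the distance uses the exact hyperbolic law of cosines rather than the large-radius approximation $x_{st} \approx r_s(t) + r_t + 2\ln(\theta_{st}/2)$ used in the proof in Section IV A.

```python
import math
from typing import Iterable, Tuple


def hyperbolic_distance(r1: float, theta1: float, r2: float, theta2: float) -> float:
    """Distance between two points in polar coordinates (curvature -1),
    computed with the hyperbolic law of cosines."""
    cosh_x = (math.cosh(r1) * math.cosh(r2)
              - math.sinh(r1) * math.sinh(r2) * math.cos(theta1 - theta2))
    # Clamp against rounding errors that push the argument slightly below 1.
    return math.acosh(max(1.0, cosh_x))


def attractiveness(phi: float, r_t: float, nodes: Iterable[Tuple[float, float]]) -> int:
    """Number of existing nodes (r_s, theta_s) within hyperbolic distance r_t
    of the candidate location (r_t, phi)."""
    return sum(
        1
        for r_s, theta_s in nodes
        if hyperbolic_distance(r_t, phi, r_s, theta_s) <= r_t
    )


if __name__ == "__main__":
    existing = [(5.0, 0.0), (5.0, math.pi), (1.0, 0.5)]
    print(attractiveness(0.0, 5.0, existing))
```

Each call costs one distance evaluation per existing node, so the counts would normally be precomputed once and reused for every value of $\Lambda$.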

Computing the attractivenesses $A_t(\varphi_i^{(j)})$ of the Monte Carlo
samples involves computing $O(n^3 N)$ hyperbolic distances,
which is the most computationally intensive part of the algorithm.
Having all attractivenesses computed, we can then
estimate $l(\Lambda)$ for any $\Lambda \in [0 : \Delta\Lambda : \Lambda_{\max}]$, and find the
maximum likelihood estimate (MLE) $\Lambda_{\mathrm{MLE}}$. An important
observation that drastically improves the efficiency of the algorithm
is that we do not have to use the entire network to
accurately estimate $\Lambda_{\mathrm{MLE}}$; the first $n_0 \ll n$ nodes are often
enough. Table I shows the MLEs $\Lambda_{\mathrm{MLE}}(n_0)$ obtained from
the first $n_0 = 100$, 200, 500, and 1000 nodes of the networks
generated by the GPA model with $\Lambda = 0$, 0.2, 0.5, 0.7, 1, and
2. The corresponding log-likelihood functions are shown in
Fig. 6. These simulation results show that the smaller the true
value of $\Lambda$ (and we expect it to be small in real networks,
since most of them have community structure), the less network
data we need to pin $\Lambda_{\mathrm{MLE}}$ down. If, for example, $\Lambda = 0$,
then the MLE of $\Lambda$ based on the first $n_0 = 100$ nodes is already
zero. The larger the true value of $\Lambda$, however, the flatter
the log-likelihood is around its maximum, which makes inference
more challenging.
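The estimator (16) and the grid search over $\Lambda$ can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the input layout (`a_obs`, `a_mc`), the function names, the grid parameters, and the convention used when a denominator vanishes are all choices made for this example. Since (16) only needs the sums $\sum_i A_t(\varphi_i^{(j)})$, the sketch takes those sums as precomputed inputs.

```python
import math
from typing import List, Sequence


def _ratio(a_t: float, s: float, t: int, lam: float) -> float:
    """Summand of Eq. (16) for one Monte Carlo sample."""
    denom = a_t + s + t * lam
    if denom == 0.0:
        # Assumption of this sketch: with no attractiveness anywhere and
        # Lambda = 0, all t candidates are treated as equally likely.
        return 1.0 / t
    return (a_t + lam) / denom


def log_likelihood(lam: float,
                   a_obs: Sequence[float],
                   a_mc: Sequence[Sequence[float]]) -> float:
    """Monte Carlo estimate of l(Lambda), up to a constant, following Eq. (16).

    a_obs[k]   : A_t(theta_t) for node t = k + 2
    a_mc[k][j] : sum over i of A_t(phi_i^(j)) for the j-th Monte Carlo sample
    """
    total = 0.0
    for k, (a_t, sums) in enumerate(zip(a_obs, a_mc)):
        t = k + 2
        mean_ratio = sum(_ratio(a_t, s, t, lam) for s in sums) / len(sums)
        if mean_ratio <= 0.0:
            return float("-inf")  # observed position impossible at this Lambda
        total += math.log(mean_ratio)
    return total


def mle_on_grid(a_obs: Sequence[float],
                a_mc: Sequence[Sequence[float]],
                lam_max: float = 2.0,
                step: float = 0.1) -> float:
    """Grid point in [0 : step : lam_max] maximizing the estimated log-likelihood."""
    grid: List[float] = [i * step for i in range(int(round(lam_max / step)) + 1)]
    return max(grid, key=lambda lam: log_likelihood(lam, a_obs, a_mc))


if __name__ == "__main__":
    a_obs = [3.0, 2.0, 4.0]                       # nodes t = 2, 3, 4
    a_mc = [[0.0, 1.0], [2.0, 0.0], [1.0, 3.0]]   # N = 2 samples per node
    print(mle_on_grid(a_obs, a_mc))
```

Restricting `a_obs` and `a_mc` to the first $n_0$ nodes gives the truncated estimates $\Lambda_{\mathrm{MLE}}(n_0)$ discussed above.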